Zero-Shot Audio Classification Via Semantic Embeddings

Abstract

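In this paper, we study zero-shot learning in audio classification via semantic embeddings extracted from textual labels and sentence descriptions of sound classes. Our goal is to obtain a classifier that is capable of recognizing audio instances of sound classes that have no available training samples, but only semantic side information. We employ a bilinear compatibility framework to learn an acoustic-semantic projection between intermediate-level representations of audio instances and sound classes, i.e., acoustic embeddings and semantic embeddings. We use VGGish to extract deep acoustic embeddings from audio clips, and pre-trained language models (Word2Vec, GloVe, BERT) to generate either label embeddings from textual labels or sentence embeddings from sentence descriptions of sound classes. Audio classification is performed by a linear compatibility function that measures how compatible an acoustic embedding and a semantic embedding are. We evaluate the proposed method on a small balanced dataset, ESC-50, and a large-scale unbalanced audio subset of AudioSet. The experimental results show that classification performance is significantly improved by involving sound classes that are semantically close to the test classes in training. Meanwhile, we demonstrate that both label embeddings and sentence embeddings are useful for zero-shot learning. Classification performance is improved by concatenating label/sentence embeddings generated with different language models. With their hybrid concatenations, the results are improved further.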
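To make the bilinear compatibility idea in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It scores a clip embedding x against a class embedding y with x^T W y and picks the highest-scoring unseen class. Several things here are assumptions for illustration only: the toy vectors stand in for VGGish and language-model embeddings, the `mixing` matrix and the helper names are hypothetical, and W is fitted with closed-form ridge regression as a simple stand-in because the abstract does not state the training objective.

```python
import numpy as np

def fit_projection(acoustic, class_emb, lam=1.0):
    """Ridge stand-in (an assumption, not the paper's objective):
    find W so that acoustic @ W approximates each clip's class embedding."""
    d = acoustic.shape[1]
    return np.linalg.solve(acoustic.T @ acoustic + lam * np.eye(d),
                           acoustic.T @ class_emb)

def compatibility(acoustic, W, class_emb):
    """Bilinear score x^T W y for every (clip, class) pair."""
    return acoustic @ W @ class_emb.T

def unit(rows):
    """Scale each row to unit length."""
    rows = np.asarray(rows, dtype=float)
    return rows / np.linalg.norm(rows, axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Toy semantic embeddings: six seen classes and three unseen classes.
seen = unit([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]])
unseen = unit([[1, 1, 1], [1, -1, 0], [-1, 0, 1]])

# Hypothetical hidden map from the semantic space to a 16-d acoustic space.
mixing = rng.normal(size=(3, 16))

def sample_clips(class_emb, per_class):
    """Draw noisy toy acoustic embeddings for each class."""
    labels = np.repeat(np.arange(len(class_emb)), per_class)
    noise = 0.05 * rng.normal(size=(len(labels), 16))
    return class_emb[labels] @ mixing + noise, labels

train_x, train_y = sample_clips(seen, 20)    # clips from seen classes only
test_x, test_y = sample_clips(unseen, 20)    # clips from unseen classes

W = fit_projection(train_x, seen[train_y])
predicted = np.argmax(compatibility(test_x, W, unseen), axis=1)
accuracy = float(np.mean(predicted == test_y))
print(f"zero-shot accuracy on unseen toy classes: {accuracy:.2f}")
```

No clip from an unseen class is used to fit W. The unseen classes enter only through their semantic embeddings at prediction time, which is the sense in which the classifier relies on side information alone.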

Related resources

Zero-shot Recognition via Semantic Embeddings and Knowledge Graphs

We consider the problem of zero-shot recognition: learning a visual classifier for a category with zero training examples, using only the word embedding of the category and its relationship to other categories for which visual data are provided. The key to dealing with the unfamiliar or novel category is to transfer knowledge obtained from familiar classes to describe the unfamiliar class. In this...

Zero-Shot Learning for Semantic Utterance Classification

We propose a novel zero-shot learning method for semantic utterance classification (SUC). It learns a classifier f : X → Y for problems where none of the semantic categories Y are present in the training set. The framework uncovers the link between categories and utterances through a semantic space. We show that this semantic space can be learned by deep neural networks trained on large amounts...

Probabilistic Zero-shot Classification with Semantic Rankings

In this paper we propose a non-metric ranking-based representation of semantic similarity that allows natural aggregation of semantic information from multiple heterogeneous sources. We apply the ranking-based representation to zero-shot learning problems, and present deterministic and probabilistic zero-shot classifiers which can be built from pre-trained classifiers without retraining. We demon...

Zero-Shot Learning by Convex Combination of Semantic Embeddings

Several recent publications have proposed methods for mapping images into continuous semantic embedding spaces. In some cases the embedding space is trained jointly with the image transformation. In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage. Proponents...

Neighborhood Sensitive Mapping for Zero-Shot Classification using Independently Learned Semantic Embeddings

In a traditional setting, classifiers are trained to approximate a target function f : X → Y where at least one sample for each y ∈ Y is presented to the training algorithm. In a zero-shot setting we have a subset of the labels Ŷ ⊂ Y for which we do not observe any corresponding training instance. Still, the function f that we train must also be able to assign labels correctly on Ŷ. In practice,...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2021

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2021.3065234